Solving POMDP by On-Policy Linear Approximate Learning Algorithm
Abstract
This paper presents a fast Reinforcement Learning (RL) algorithm for solving Partially Observable Markov Decision Process (POMDP) problems. The proposed algorithm is devised to provide a policy-making framework for Network Management Systems (NMS), an engineering application for which no exact model is available. The algorithm consists of two phases. First, the model is estimated and a policy is learned in a completely observable simulator. Second, the estimated model is brought into the partially observable real world, where the learned policy is fine-tuned. The learning algorithm is based on on-policy linear gradient-descent learning with eligibility traces: the Q-value on the belief space is linearly approximated by the Q-values at the vertices of the belief space, to which an online TD method is applied. The proposed algorithm is tested against exact solutions on extensive small- and middle-size benchmark examples from the POMDP literature and found to be near optimal in terms of average discounted reward and steps-to-goal. The proposed algorithm significantly reduces convergence time and can easily be adapted to problems with large state counts.
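To make the belief-vertex approximation concrete, the sketch below implements on-policy SARSA(λ) with linear function approximation over the belief simplex, in the spirit of the algorithm described above: Q(b, a) is computed as the belief-weighted sum of vertex Q-values, and the TD update with eligibility traces is applied to those vertex values. The environment interface (reset/step returning a belief vector, reward, and a done flag) and all hyperparameter names are illustrative assumptions, not the paper's actual code.

```python
import numpy as np

# Sketch of the on-policy linear TD idea described above:
# Q(b, a) ~ sum_s b(s) * Q(s, a), i.e. a linear combination of the
# Q-values at the vertices (corners) of the belief simplex, with
# SARSA(lambda) updating the vertex values. The env interface is an
# illustrative assumption.
def sarsa_lambda_belief(env, n_states, n_actions,
                        episodes=500, alpha=0.1, gamma=0.95,
                        lam=0.9, epsilon=0.1, rng=None):
    rng = rng or np.random.default_rng(0)
    Q = np.zeros((n_states, n_actions))   # Q-values at belief-simplex vertices

    def q_belief(b, a):
        return b @ Q[:, a]                # linear approximation on belief space

    def policy(b):                        # epsilon-greedy over approximated Q
        if rng.random() < epsilon:
            return int(rng.integers(n_actions))
        return int(np.argmax([q_belief(b, a) for a in range(n_actions)]))

    for _ in range(episodes):
        b = env.reset()                   # initial belief over hidden states
        a = policy(b)
        e = np.zeros_like(Q)              # eligibility traces on vertex values
        done = False
        while not done:
            b2, r, done = env.step(a)     # env tracks the belief (Bayes filter)
            a2 = policy(b2)
            target = r if done else r + gamma * q_belief(b2, a2)
            delta = target - q_belief(b, a)
            # gradient of the linear form w.r.t. Q[:, a] is the belief b itself
            e[:, a] += b                  # accumulating traces
            Q += alpha * delta * e
            e *= gamma * lam
            b, a = b2, a2
    return Q
```

Because the features are the belief vector itself, this update reduces to ordinary tabular SARSA(λ) whenever the belief collapses to a single vertex, which corresponds to the fully observable simulator phase of the two-phase scheme.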
Similar resources
Model-Based Online Learning of POMDPs
Learning to act in an unknown partially observable domain is a difficult variant of the reinforcement learning paradigm. Research in the area has focused on model-free methods — methods that learn a policy without learning a model of the world. When sensor noise increases, model-free methods provide less accurate policies. The model-based approach — learning a POMDP model of the world, and comp...
A (Revised) Survey of Approximate Methods for Solving Partially Observable Markov Decision Processes
Partially observable Markov decision processes (POMDPs) are interesting because they provide a general framework for learning in the presence of multiple forms of uncertainty. We survey methods for learning within the POMDP framework. Because exact methods are intractable we concentrate on approximate methods. We explore two versions of the POMDP training problem: learning when a model of the P...
Dialogue POMDP components (Part II): learning the reward function
The partially observable Markov decision process (POMDP) framework has been applied in dialogue systems as a formal framework to represent uncertainty explicitly while being robust to noise. In this context, estimating the dialogue POMDP model components (states, observations, and reward) is a significant challenge as they have a direct impact on the optimized dialogue POMDP policy. Learning sta...
Monitoring plan execution in partially observable stochastic worlds
This thesis presents two novel algorithms for monitoring plan execution in stochastic partially observable environments. The problems can be naturally formulated as partially-observable Markov decision processes (POMDPs). Exact solutions of POMDP problems are difficult to find due to the computational complexity, so many approximate solutions are proposed instead. These POMDP solvers tend to ge...
On Partially Observable Markov Decision Processes Using Genetic Algorithm Based Q-Learning
As powerful probabilistic models for optimal policy search, partially observable Markov decision processes (POMDPs) still suffer from the problems such as hidden state and uncertainty in action effects. In this paper, a novel approximate algorithm Genetic algorithm based Q-Learning (GAQ-Learning), is proposed to solve the POMDP problems. In the proposed methodology, genetic algorithms maintain ...
Publication date: 1999